π Harden DI container string type resolution: remove type builtin, consistent error handling#238
Conversation
β¦ly (#227) * Refactor: Remove bare `except Exception:` suppressing errors silently Modifies `_resolve_indexer_from_container` in `src/codeweaver/server/agent_api/search/__init__.py` to capture exceptions as `e` and log them using `logger.warning`. This improves observability and code health by preventing silent failures without changing the return logic. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The `combined_lifespan` assignment and associated comment in `src/codeweaver/server/lifespan.py` were unused dead code that provided a deprecated alias. Removing this improves code health and maintainability by keeping the module surface smaller and less confusing. All tests pass with no regressions. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
β¦imports and sys.path (#214) * Initial plan * Fix mteb_to_codeweaver.py: add KNOWN_ALIASES, fix init block, fix imports and sys.path * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> * fix: .gitignore --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Adam Poulemanos <adam@knit.li>
Updates test_error_recovery.py and test_health_monitoring.py to align with the BaseEnum interface by using `.variable` on CircuitBreakerState enum entries instead of `.value` or string literals. Also removed the `# FIX:` comments regarding this change. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* perf: use tuple instead of list for enum membership check Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(di): correctly resolve Union[None] type to None Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(di): properly handle NoneType inside Union checking Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* perf: optimize membership check in _check_profile by using set instead of list Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * ci: add Copilot to allowed_bots in claude-code-action Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* test(core): add unit tests for DiscoveredFile.absolute_path Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(ci): add allowed_bots and allowed_non_write_users to PR review comment job Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(ci): add allowed_bots and allowed_non_write_users to PR review comment job Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(ci): add allowed_bots to Issue Assigned job Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(ci): add allowed bots to all jobs Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix: handle missing optional imports safely in Python 3.14t and 3.13t Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix: correctly handle ImportError in Python 3.14t for `uuid_extensions` Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* perf: Optimize membership check for HTML block tags by using set literal Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix(di): allow type(None) in `_get_signature_and_hints` on Python 3.13 Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix: restore duckduckgo types and handle telemetry safely Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * perf: Optimize membership check for HTML block tags by using set literal Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* tests: fix mock_provider_lazy configuration in client factory integration tests Added missing `__name__` and `return_value` assignments to `mock_provider_lazy` in Qdrant provider integration tests. This ensures the mocks are consistent with how they are used by the provider factory and matches the pattern used for other providers in the same test file. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * refactor: consolidate mock_provider_lazy setup and improve comments - Implemented `make_lazy_provider_mock` helper to centralize mock configuration. - Replaced brittle line-number comments with behavioral descriptions. - Updated all Voyage and Qdrant provider tests to use the new helper. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> --------- Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Changed list membership check to a set to leverage Python's constant-folded frozenset optimization. This provides O(1) lookup time instead of O(n) and avoids list creation at runtime. Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Also added Copilot to allowed_bots and allowed_non_write_users in `.github/workflows/claude.yml` to prevent CI failures. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Also fixed Copilot permissions in `.github/workflows/claude.yml` to resolve CI failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Also fixed Copilot permissions in `.github/workflows/claude.yml` to resolve CI failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add coverage for force shutdown conditional branches Added comprehensive unit tests for `_setup_signal_handler` in `src/codeweaver/main.py` to cover conditional logic within the `force_shutdown_handler` closure. The tests verify correct behavior for initial and subsequent SIGINT interruptions, as well as standard error handling upon signal registration failure. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* test: Add tests for `get_version` function Added comprehensive unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended: - Reading `codeweaver._version.__version__` - Using `importlib.metadata.version` - Falling back to `git describe` - Handling missing `git` command - Handling broader unexpected exceptions All fallback scenarios now properly ensure the code returns the default `0.0.0` string or the valid version. No functional code was changed, only new tests were introduced. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> * test: Add tests for `get_version` function Added unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended: - Reading `codeweaver._version.__version__` - Using `importlib.metadata.version` - Falling back to `git describe` - Handling missing `git` command - Handling broader unexpected exceptions Uses `monkeypatch` safely by keeping original references to avoid polluting the global namespace and Pytest internal packages. No functional code changes. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: Add tests for `get_version` function Added comprehensive unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended: - Reading `codeweaver._version.__version__` - Using `importlib.metadata.version` - Falling back to `git describe` - Handling missing `git` command - Handling broader unexpected exceptions Uses `monkeypatch` safely by keeping original references to avoid polluting the global namespace and Pytest internal packages. No functional code changes. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: Add tests for `get_version` function Added comprehensive unit tests for `get_version` in `tests/unit/test_init.py`. The tests verify that all mechanisms to retrieve the package version function as intended: - Reading `codeweaver._version.__version__` - Using `importlib.metadata.version` - Falling back to `git describe` - Handling missing `git` command - Handling broader unexpected exceptions Uses `monkeypatch` safely by keeping original references to avoid polluting the global namespace and Pytest internal packages. No functional code changes. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * test: add robust coverage for get_version fallbacks Adds comprehensive tests for `codeweaver.get_version()` in `tests/unit/test_init.py`. Properly mocks the module import mechanisms, importlib.metadata, and git subprocess calls to ensure all fallback paths (file -> metadata -> git -> default) are tested. Addresses PR feedback by isolating the `_original_version` scope and adding explanatory comments about module reloading requirements during tests. Removes `test_investigate.py`. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * fix: spdx information --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* π Replace insecure pickle with JSON for node types cache Mitigate insecure deserialization vulnerability by switching the tree-sitter node types cache from pickle to JSON. - Updated scripts/build/preprocess-node-types.py to serialize cache to JSON. - Updated src/codeweaver/semantic/node_type_parser.py to load and validate the JSON cache using pydantic.TypeAdapter. - Corrected build and CI artifact paths in mise.dev.toml. - Updated documentation in src/codeweaver/semantic/data/__init__.py. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * π Switch node types cache to JSON and fix CI issues - Switch tree-sitter node types cache from pickle to JSON to mitigate insecure deserialization (S301). - Use Pydantic's TypeAdapter for safe validation and loading. - Fix CI failures on Python 3.14 by using the project's internal uuid7 utility instead of uuid_extensions. - Clean up classification_result during cache loading to ensure fresh recomputation. - Correct artifact paths in mise.dev.toml and documentation. Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding 'Unused import' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> * Fix JSON node types cache: commit missing file and repair union deserialization (#235) * Initial plan * Fix JSON cache loading: resolve connection union, fix type errors, extract helper methods, add cache file Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Address code review: clarify noqa comment, improve circular-import doc, simplify adapter guard Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
β¦, consistent None returns, test improvements Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
β¦, consistent None returns, test improvements Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
β¦, consistent None returns, test improvements, fix pre-existing lint issues Co-authored-by: bashandbone <89049923+bashandbone@users.noreply.github.com>
type builtin, consistent error handling
There was a problem hiding this comment.
Pull request overview
This PR primarily hardens CodeWeaverβs DI container string-type resolution to reduce the eval attack surface and make invalid-type handling consistent, while also including a semantic node-types cache migration (pickle β JSON), dependency bumps, and various lint/test cleanups.
Changes:
- Hardened
_safe_eval_typein the DI container (removetypebuiltin, unify invalid-input behavior to returnNone, and handleNoneTyperesolution). - Migrated semantic node-types cache loading/generation from pickle to JSON with Pydantic validation.
- Updated dependencies/locks and applied assorted test/lint refactors and reliability tweaks across providers, scripts, and tests.
Reviewed changes
Copilot reviewed 44 out of 51 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Updates locked dependency versions (multiple provider/client libs). |
| pyproject.toml | Updates runtime dependency constraints (e.g., cyclopts, rich, openai, qdrant-client, huggingface-hub). |
| mise.toml | Adds hk tool version to dev toolchain. |
| mise.dev.toml | Updates build artifact checks/add steps for JSON node-types cache path. |
| hk.pkl | Raises allowed max size for added files (supports large generated artifacts). |
| .gitignore | Adjusts ignored directories/files (exportify config handling). |
| .github/workflows/claude.yml | Expands allowed bot/non-write users for the Claude workflow. |
| src/codeweaver/core/di/container.py | Hardens type-string eval and normalizes invalid type handling; adds NoneType resolution. |
| tests/di/test_container_security.py | Aligns with unit-test conventions and adds regression test blocking type(...) exploitation. |
| src/codeweaver/semantic/node_type_parser.py | Switches node-types cache loader from pickle to JSON + TypeAdapter validation and reconstruction. |
| src/codeweaver/semantic/data/init.py | Updates package docs to reference JSON cache artifact. |
| src/codeweaver/semantic/data/node_types_cache.json | New/updated generated node-types JSON cache artifact. |
| src/codeweaver/semantic/data/node_types_cache.json.license | Adds SPDX license sidecar for generated JSON cache. |
| scripts/build/preprocess-node-types.py | Generates JSON cache instead of pickle during build preprocessing. |
| src/codeweaver/semantic/ast_grep.py | Fixes has_package usage to align with its boolean contract. |
| src/codeweaver/providers/vector_stores/inmemory.py | Ensures persistence parent directory exists before creating Qdrant client. |
| src/codeweaver/providers/reranking/providers/base.py | Minor style/robustness tweak (tuple membership check). |
| src/codeweaver/providers/embedding/capabilities/base.py | Fixes has_package boolean usage. |
| src/codeweaver/providers/config/providers.py | Fixes has_package boolean usage for default provider selection. |
| src/codeweaver/providers/config/profiles.py | Adjusts default data-provider selection based on ddgs availability. |
| src/codeweaver/providers/config/clients/multi.py | Improves optional imports and renames FastEmbed ONNX providers field with aliasing. |
| src/codeweaver/providers/config/clients/agent.py | Removes stale type-ignore and fixes has_package boolean usage in TYPE_CHECKING block. |
| src/codeweaver/providers/config/backup_models.py | Fixes fastembed/sentence-transformers availability checks to use boolean has_package. |
| src/codeweaver/engine/chunker/delimiters/custom.py | Replaces inline HTML block tag list with a constant frozenset. |
| src/codeweaver/core/utils/generation.py | Refactors UUID7 generator import behavior (but introduces a regression in timestamp parsing). |
| src/codeweaver/core/types/service_cards.py | Updates DuckDuckGo tool import path and related comment. |
| src/codeweaver/core/statistics.py | Refines profile comparison logic (avoids list membership). |
| src/codeweaver/cli/commands/index.py | Uses set membership for confirmation checks. |
| src/codeweaver/server/agent_api/search/init.py | Improves warning logs with exception context (exc_info=True). |
| src/codeweaver/server/lifespan.py | Removes deprecated backward-compatibility alias. |
| tests/unit/test_main.py | Adds new unit tests for signal handler behavior (needs unit-test lint alignment). |
| tests/unit/test_init.py | Adds new unit tests for version resolution logic (needs unit-test lint alignment). |
| tests/unit/core/test_discovery.py | Adds unit tests for DiscoveredFile path resolution behavior. |
| tests/unit/core/telemetry/test_privacy.py | Marks test class with pytest.mark.unit. |
| tests/unit/config/test_versioned_profile.py | Adds pytest.mark.unit to test classes. |
| tests/unit/cli/test_httpx_lazy_import.py | Adds additional pytest markers to the test class. |
| tests/unit/core/types/test_embedding_batch_info_intent.py | Switches UUID7 generator import to internal utility. |
| tests/unit/core/types/test_chunk_embeddings_properties.py | Switches UUID7 generator import to internal utility. |
| tests/integration/conftest.py | Fixes has_package boolean usage for feature flags. |
| tests/integration/chunker/config/test_client_factory_integration.py | Adds helper for lazy provider mocks to reduce duplication. |
| tests/integration/server/test_health_monitoring.py | Uses CircuitBreakerState.*.variable instead of hardcoded strings. |
| tests/integration/server/test_error_recovery.py | Uses CircuitBreakerState.*.variable instead of .value. |
| tests/integration/providers/env_registry/test_definitions.py | Adds pytest.mark.integration to multiple test classes. |
| scripts/model_data/mteb_to_codeweaver.py | Refactors script imports/types and path setup (but currently imports Provider from a non-existent module). |
| scripts/model_data/secondary_providers.json | Adds model lists for secondary providers. |
| scripts/model_data/secondary_providers.json.license | Adds SPDX license for secondary providers JSON. |
| scripts/model_data/hf-models.json | Updates HF model metadata cache (but currently malformed JSON). |
| scripts/model_data/hf-models.json.license | Adds SPDX license for hf-models.json. |
| scripts/build/generate-mcp-server-json.py | Minor import ordering adjustment. |
| docs/plans/2026-02-24-provider-config-consolidation.md | Updates has_package examples to boolean usage. |
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ] | ||
| } | ||
| }, | ||
| "models": {}, |
| { | ||
| "models": {} | ||
| } No newline at end of file |
| import signal | ||
|
|
||
| from unittest.mock import patch | ||
|
|
||
| import pytest | ||
|
|
||
| from codeweaver.main import _setup_signal_handler | ||
|
|
||
|
|
||
| def test_setup_signal_handler_first_interrupt(): | ||
| """Test that the first interrupt raises KeyboardInterrupt.""" |
| # SPDX-FileCopyrightText: 2026 Knitli Inc. | ||
| # | ||
| # SPDX-License-Identifier: MIT OR Apache-2.0 | ||
|
|
||
| import importlib.metadata | ||
| import shutil | ||
| import subprocess | ||
| import sys | ||
|
|
||
| from unittest.mock import MagicMock | ||
|
|
||
| import pytest | ||
|
|
||
| import codeweaver | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def mock_no_version_file(monkeypatch): | ||
| """Remove _version module from sys.modules and block re-import from disk. | ||
|
|
|
|
||
| from codeweaver.core import Provider | ||
| from codeweaver.providers import PartialCapabilities | ||
| from codeweaver.providers.provider import Provider |
| def _safe_eval_type(self, type_str: str, globalns: dict[str, Any]) -> Any | None: | ||
| """Safely evaluate a type string using AST validation. | ||
|
|
||
| Parses the type string into an AST, validates that it contains only safe |
There was a problem hiding this comment.
Agreed. This PR conflates multiple concerns which makes review difficult and rollback risky. The security fix itself is solid, but it should be separated from:
- Major dependency bumps (huggingface-hub 0.36β1.7, openai 2.17β2.28, etc.)
- PickleβJSON cache migration
- Pre-existing lint fixes
Each deserves independent review and testing. If a dependency update causes issues, rolling back would also revert the security fix.
| if sys.version_info < (3, 14): | ||
| from uuid_extensions import time_ns, uuid_to_datetime | ||
|
|
||
| return uuid_to_datetime(uuid) if as_datetime else time_ns(uuid) | ||
| from uuid import uuid7 | ||
|
|
||
| uuid = uuid7(uuid) if isinstance(uuid, str | int) else uuid | ||
| return ( |
There was a problem hiding this comment.
Security Hardening Review
I've reviewed PR #238 which hardens the DI container's string type resolution. Here's my assessment:
β Security Improvements (Excellent)
The core security changes in src/codeweaver/core/di/container.py are well-designed and effective:
-
Removed
typebuiltin (line 150-162): Correctly excludestypefrom safe_builtins to prevent dynamic class creation viatype('X', (object,), {}). The inline comment clearly explains the security rationale. -
Consistent error handling: Changed from mixed error handling (SyntaxError β None, TypeError propagated) to uniform
TypeError β Nonereturns. This is safer and more predictable. -
Updated return type: Properly changed return annotation to
Any | Nonewith matching docstring. -
Test coverage:
test_type_builtin_not_exploitable()(line 98-102) properly validates the exploit is blocked. -
Added
Nonetype handling (line 644-645): Good defensive addition for edge cases.
β οΈ Lint Fixes Review
Most lint fixes are reasonable improvements:
- Replacing
has_package(x) is not Nonewithhas_package(x)is correct (assuming has_package returns bool) - Logic fix in
backup_models.py:57: Changedandtoor- this is correct (FastEmbed available if either package exists) - Import guards with try/except in
multi.pyadd safety - Using
isfor enum comparisons instatistics.pyis good practice - FastEmbedClientOptions refactor to use
model_copy(update=...)instead ofobject.__setattr__is cleaner
π¨ Critical Issue: Invalid JSON
scripts/model_data/hf-models.json is malformed:
Line 216: "models": {},
Line 217: "models": { β Duplicate key\!
...
Line 3927: }
Line 3928: { β Missing comma, new object instead of continuation
Line 3929: "models": {}
Line 3930: }
This breaks JSON parsing. The file needs to be fixed - either:
- Remove duplicate
"models": {}entries - Add proper comma/structure between objects
- Regenerate the file if it's auto-generated
Minor Concerns
-
.gitignorechanges (line 232-233): Changed from.gemini/to.exportify/. Consider if this should be in a separate PR focused on configuration changes. -
Workflow changes: Adding
jules[bot]andcopilotto allowed users/bots - ensure this aligns with security policies for automated PR creation. -
hk.pkl: Increased max file size from 10MB to 15MB - ensure this won't allow accidentally committing large artifacts.
Recommendation
Request changes due to the malformed JSON file. The security improvements are solid, but the broken JSON will cause runtime errors and must be fixed before merging.
Once the JSON is fixed, this PR will be ready to merge. The security hardening is well-implemented and the test coverage is appropriate.
There was a problem hiding this comment.
Security Hardening Review
I've reviewed PR #238 which hardens the DI container's string type resolution. Here's my assessment:
β Security Improvements (Excellent)
The core security changes in src/codeweaver/core/di/container.py are well-designed and effective:
-
Removed
typebuiltin (line 150-162): Correctly excludestypefrom safe_builtins to prevent dynamic class creation viatype('X', (object,), {}). The inline comment clearly explains the security rationale. -
Consistent error handling: Changed from mixed error handling (SyntaxError β None, TypeError propagated) to uniform
TypeError β Nonereturns. This is safer and more predictable. -
Updated return type: Properly changed return annotation to
Any | Nonewith matching docstring. -
Test coverage:
test_type_builtin_not_exploitable()(line 98-102) properly validates the exploit is blocked. -
Added
Nonetype handling (line 644-645): Good defensive addition for edge cases.
β οΈ Lint Fixes Review
Most lint fixes are reasonable improvements:
- Replacing
has_package(x) is not Nonewithhas_package(x)is correct (assuming has_package returns bool) - Logic fix in
backup_models.py:57: Changedandtoor- this is correct (FastEmbed available if either package exists) - Import guards with try/except in
multi.pyadd safety - Using
isfor enum comparisons instatistics.pyis good practice - FastEmbedClientOptions refactor to use
model_copy(update=...)instead ofobject.__setattr__is cleaner
π¨ Critical Issue: Invalid JSON
scripts/model_data/hf-models.json is malformed:
Line 216: "models": {},
Line 217: "models": { β Duplicate key\!
...
Line 3927: }
Line 3928: { β Missing comma, new object instead of continuation
Line 3929: "models": {}
Line 3930: }
This breaks JSON parsing. The file needs to be fixed - either:
- Remove duplicate
"models": {}entries - Add proper comma/structure between objects
- Regenerate the file if it's auto-generated
Minor Concerns
-
.gitignorechanges (line 232-233): Changed from.gemini/to.exportify/. Consider if this should be in a separate PR focused on configuration changes. -
Workflow changes: Adding
jules[bot]andcopilotto allowed users/bots - ensure this aligns with security policies for automated PR creation. -
hk.pkl: Increased max file size from 10MB to 15MB - ensure this won't allow accidentally committing large artifacts.
Recommendation
Requesting changes due to the malformed JSON file. The security improvements are solid, but the broken JSON will cause runtime errors and must be fixed before merging.
Once the JSON is fixed, this PR will be ready to merge. The security hardening is well-implemented and the test coverage is appropriate.
There was a problem hiding this comment.
Code Review Summary
I've reviewed PR #238 focusing on the security hardening changes to the DI container and the broader codebase modifications. Here's my detailed analysis:
β Security Hardening (Primary Changes)
DI Container (src/codeweaver/core/di/container.py)
The security improvements to _safe_eval_type are excellent:
-
Removal of
typefrom safe_builtins (lines 148-149)- β
Correct decision -
typeis not needed for type annotation resolution - β
Prevents exploitation via
type('X', (object,), {})withast.Callnodes - β Well-documented inline comment explains the rationale
- β
Correct decision -
-
Consistent error handling (lines 141-144)
- β
Now wraps
TypeValidatorintry/except TypeErrorto returnNone - β
Matches the existing
SyntaxErrorhandling behavior - β
Method now consistently returns
Nonefor all invalid inputs - β
Updated return type annotation to
Any | None(line 87)
- β
Now wraps
-
Test coverage (
tests/di/test_container_security.py)- β
New
test_type_builtin_not_exploitableverifies the fix (lines 98-103) - β
Added proper module docstring and
from __future__ import annotations - β
Added
pytestmarkfor test categorization
- β
New
Minor improvement note: The eval() call at line 165 has a noqa: S307 comment, which is appropriate since the security is enforced through AST validation and restricted builtins.
β οΈ Concerns & Questions
1. Large scope - many unrelated changes
This PR includes 51 changed files with multiple concerns mixed together:
- Security fixes (main focus)
- Pre-existing lint fixes across multiple files
- Workflow configuration changes (
.github/workflows/claude.yml) - Dependency version updates (
pyproject.toml) - File format changes (
node_types_cache.pklβnode_types_cache.json) - Multiple lint suppressions added
Recommendation: Consider splitting this into focused PRs for easier review and rollback if needed.
2. hf-models.json malformation (lines 216, 3928-3930)
"models": {}, // Line 216 - duplicate key
"models": {
// ... 3700+ lines ...
}
{ // Line 3928 - Invalid JSON structure
"models": {}
}This file has invalid JSON with:
- Duplicate
"models"key at line 216 - Orphaned object at the end (lines 3928-3930)
Action required: This will break JSON parsing and needs immediate fix.
3. Type ignore annotations in mteb_to_codeweaver.py (lines 375-381)
type HFModelProviders = Literal[
Provider.FASTEMBED, # ty: ignore[invalid-type-form]
Provider.HUGGINGFACE_INFERENCE, # ty: ignore[invalid-type-form]
# ... etc
]While the comment explains this is because ty doesn't recognize BaseEnum subclass members, consider:
- Is this a
tybug that should be reported? - Could the type definition be restructured to avoid this?
4. FastEmbedClientOptions field aliasing (src/codeweaver/providers/config/clients/multi.py, lines 243-246)
The change from providers to onnx_providers with aliasing is good for clarity, but:
- Ensure backward compatibility if this is a user-facing API
- Consider deprecation notice if the old name was public
5. Import guards and type stubs (lines 46-65 in multi.py)
Multiple conditional imports with fallback to Any:
if has_package("fastembed") or has_package("fastembed_gpu"):
try:
from fastembed.common.types import OnnxProvider
except ImportError:
OnnxProvider = Any # type: ignore[assignment, misc]This pattern could hide import errors. Consider logging when falling back to Any.
β Positive Changes
- Consistent use of
has_package()- Many places changed fromhas_package(...) is not Noneto justhas_package(...), which is cleaner - Error handling improvements in
container.py- More defensive checks forNone - Better code organization - Added docstrings and type hints in tests
- GitHub Actions updates - Added Jules bot to allowed users/bots
π Testing Status
CI checks show:
- β CLA check passed
- β Socket security passed
- β³ Claude response pending
Recommendations
Before merging:
- Fix the
hf-models.jsonsyntax errors (critical) - Verify the
node_types_cache.jsonformat change doesn't break existing deployments - Run full test suite to ensure all 51 file changes are compatible
- Consider splitting unrelated changes into separate PRs
Security assessment: The core security fixes are sound and well-tested. The removal of type from safe builtins is the right call, and the consistent error handling is a good improvement.
Code quality: Generally good, but the PR scope is too broad for optimal review. The security fixes themselves are excellent.
Verdict
The security hardening changes are approved and ready to merge from a security perspective. However, the JSON syntax error must be fixed first, and I recommend addressing the scope concerns for future PRs.
Would you like me to prepare a fix for the hf-models.json issue?
|
I've reviewed this PR and I agree with the concern raised by Copilot about the PR scope. Review SummarySecurity Fix (DI Container) β
Scope Concern
Recommendations
The security improvements are solid, but the PR scope makes review, testing, and potential rollback more risky than necessary. |
π¨ Critical Issue Found: Invalid JSON in
|
There was a problem hiding this comment.
PR Review: Security Hardening for DI Container
Summary
This PR addresses important security vulnerabilities in the DI container's type evaluation mechanism and includes necessary test improvements and lint fixes. The changes are well-structured and thoroughly tested.
Key Security Improvements β
1. Removed type builtin from safe_builtins
- File:
src/codeweaver/core/di/container.py:148-149 - Excellent fix! The
typebuiltin was unnecessary for type resolution and could be exploited viatype('X', (object,), {})to create dynamic classes - Clear inline comment explains the exclusion rationale
- Test coverage added in
test_type_builtin_not_exploitable
2. Consistent error handling
- File:
src/codeweaver/core/di/container.py:87,141-144 TypeErrorfromTypeValidatoris now caught and returnsNone- Previously
SyntaxErrorreturnedNonebutTypeErrorpropagated - now consistent - Return type properly updated to
Any | None - Docstring accurately reflects behavior
3. Enhanced test coverage
tests/di/test_container_security.pyfollows project conventions with:- Module docstring β
from __future__ import annotationsβpytestmarkfor test categorization β- Per-function docstrings β
- Return type annotations β
Issue Found: test_main.py Linting π΄
File: tests/unit/test_main.py
Unit tests under tests/unit/** are not covered by the tests/*.py Ruff per-file ignores (see ruff.toml:179-197), so they must satisfy stricter linting rules:
Required changes:
- Add module docstring
- Add
from __future__ import annotations - Add return type annotations (
-> None) to all test functions
Suggested fix:
# SPDX headers...
"""Unit tests for signal handler setup in main module.
This module tests the signal handler registration and interrupt handling
logic in the CodeWeaver main entry point.
"""
from __future__ import annotations
import signal
from unittest.mock import patch
import pytest
from codeweaver.main import _setup_signal_handler
def test_setup_signal_handler_first_interrupt() -> None:
"""Test that the first interrupt raises KeyboardInterrupt."""
# ... rest of test
def test_setup_signal_handler_second_interrupt() -> None:
"""Test that the second interrupt exits immediately."""
# ... rest of test
def test_setup_signal_handler_suppress_errors() -> None:
"""Test that ValueError and OSError are suppressed when setting the signal."""
# ... rest of testPre-existing Lint Fixes β
The PR includes several necessary lint fixes to unblock push after rebase:
node_type_parser.py: Changedassert adapter is not Noneto proper conditional checkproviders/config/clients/agent.py: Removed stalety:ignore[invalid-method-override]providers/config/clients/multi.py: Removed unused type ignores from else branchesscripts/model_data/mteb_to_codeweaver.py: Added proper type ignore comments forLiteral[Provider.X]members
All these are appropriate fixes that improve code quality.
Additional Observations
Performance Optimizations:
src/codeweaver/cli/commands/index.py:128: Changed list to set for membership check (good optimization!)src/codeweaver/engine/chunker/delimiters/custom.py:360: Used frozenset for HTML block tags
Code Quality:
- Clear comments explaining security decisions
- Comprehensive test coverage for security-critical code
- Docstrings properly updated to reflect behavior
Recommendation
Request Changes - The PR needs the test_main.py linting issue fixed before merge. Once that's addressed, this will be ready to merge.
Testing
- Security tests are comprehensive and well-written
- Test coverage for the
typebuiltin exclusion is excellent - All tests follow project conventions (except
test_main.py)
Great work on the security hardening! π
There was a problem hiding this comment.
Security Fix: Excellent β
The core security improvements are well-implemented:
Primary Security Changes
- β
Removal of
typefrom safe_builtins (line 148-149): Critical fix that preventstype('X', (object,), {})exploitation. The inline comment clearly explains the rationale. - β
Consistent error handling (line 141-144):
TypeValidatornow raisesTypeError(wrapped to returnNone), making all error paths consistent. Good defensive programming. - β
Test coverage:
test_type_builtin_not_exploitablespecifically validates the security fix. Well done.
Security Assessment: The vulnerability is properly mitigated. The fix correctly identifies that type() is unnecessary for type annotation resolution and represents an exploitable surface when combined with allowed ast.Call nodes.
Critical Issue: Malformed JSON β
// scripts/model_data/hf-models.json line 216
},
"models": {}, // β Duplicate key!
"models": {This file has:
- Duplicate
"models": {}key at line 216 - Malformed closing structure (line 3929-3930)
This will break JSON parsing and must be fixed before merge.
Code Quality Observations
Good Changes β
- multi.py: Using
model_copy(update=...)instead ofobject.__setattr__is proper Pydantic practice - multi.py: Proper fallbacks for conditional imports (
OnnxProvider,ModuleβAny) - Pickle β JSON cache migration: Improves security (JSON can't execute code like pickle)
- Test formatting: Added proper module conventions for unit tests
Questions β οΈ
-
Dependency bumps (pyproject.toml): Have these been tested for compatibility? Notable changes:
huggingface-hub: 0.36.2 β >=1.7.1 (major version jump)openai: 2.17.0 β 2.28.0pydantic-ai-slim: >=1.56.0 β >=1.68.0
-
File size limit increase (hk.pkl): 10MB β 15MB - what's driving this need?
-
Scope: This PR mixes security fixes, cache format migration, lint fixes, and dependency updates. Consider splitting for easier review/rollback.
Recommendation
Request Changes solely due to the JSON syntax error. Once that's fixed, the security improvements are solid and should merge.
The core changes to container.py:87-165 and test_container_security.py:98-103 are excellent defensive security work. π
There was a problem hiding this comment.
Code Review - Security Hardening PR
I've reviewed this pull request, focusing on the security improvements and code quality. Here's my detailed assessment:
β Excellent Security Improvements
The core security changes in src/codeweaver/core/di/container.py are well-designed and effective:
-
Removed
typefrom safe_builtins (container.py:148-149)- β
Correctly prevents exploitation via
type('X', (object,), {}) - β Well-documented with clear inline comment explaining the security rationale
- β Not needed for legitimate type annotation resolution
- β
Correctly prevents exploitation via
-
Consistent error handling (container.py:141-144)
- β
Changed from mixed behavior (SyntaxError β None, TypeError propagated) to uniform
TypeError β None - β
Updated return type annotation to
Any | Nonewith matching docstring - β More predictable and safer behavior
- β
Changed from mixed behavior (SyntaxError β None, TypeError propagated) to uniform
-
Added NoneType handling (container.py:644-645)
- β Good defensive addition for edge case resolution
-
Test coverage (tests/di/test_container_security.py)
- β
New
test_type_builtin_not_exploitablevalidates the exploit is blocked - β Added proper pytest markers and module docstring
- β Follows project testing conventions
- β
New
π¨ Critical Issues That Must Be Fixed
1. Invalid JSON syntax in hf-models.json
The file has two critical syntax errors:
Line 216-217: Duplicate "models" key
},
"models": {}, β First declaration (empty)
"models": { β Duplicate key - invalid JSON\!
"Alibaba-NLP/gte-modernbert-base": {Line 3927-3929: Orphaned object at end of file
}
} β Should be end of file
{ β Missing comma / invalid structure
"models": {}
}Impact: This will cause JSON parsing to fail at runtime.
Fix: Remove the duplicate empty "models": {} on line 216 and the orphaned object at the end (lines 3928-3929).
2. Incorrect import path in mteb_to_codeweaver.py
Line 52: Imports from non-existent module
from codeweaver.providers.provider import ProviderIssue: The module codeweaver.providers.provider does not exist.
Correct import: Based on the codebase structure, Provider is defined in codeweaver.core.types.provider and re-exported via codeweaver.core:
from codeweaver.core import ProviderEvidence:
Providerclass is defined in:src/codeweaver/core/types/provider.py:252- It's exported from:
src/codeweaver/core/__init__.py(line 487, 960) - No file exists at:
src/codeweaver/providers/provider.py
Impact: This script will fail with ModuleNotFoundError when executed.
β οΈ Additional Observations
Lint Fixes (Generally Good)
- β
has_package(x) is not Noneβhas_package(x)- correct simplification - β
backup_models.py:57changedandtoor- correct logic (FastEmbed available if either package exists) - β
Import guards with try/except in
multi.pyadd robustness - β
Using
isfor enum comparisons instatistics.py- best practice - β
FastEmbedClientOptions refactor to use
model_copy(update=...)instead ofobject.__setattr__- cleaner
Scope Concerns
This PR includes 51 changed files with multiple unrelated concerns:
- Security fixes (primary focus) β
- Pickle β JSON migration for node types cache
- Dependency version updates
- Workflow configuration changes
- Multiple lint fixes across the codebase
While all changes appear reasonable, this broad scope makes review and potential rollback more challenging.
Type Annotations (mteb_to_codeweaver.py:375-381)
Multiple # ty: ignore[invalid-type-form] annotations added for Provider enum members in Literal types. The comment explains ty doesn't recognize BaseEnum subclass members - this is acceptable but consider reporting to the ty project.
π Recommendations
Before merging:
- FIX the malformed JSON in
hf-models.json(critical) - FIX the incorrect import in
mteb_to_codeweaver.pyline 52 (critical) - VERIFY the node_types_cache format migration doesn't break existing deployments
- RUN full test suite to ensure compatibility across all 51 changed files
For future PRs:
Consider splitting large multi-concern PRs into focused changes for easier review and safer rollback.
π― Verdict
Security improvements: β Excellent - well-designed, properly tested, ready to merge
Blocking issues: β Two critical bugs must be fixed first:
- Invalid JSON syntax
- Incorrect module import path
Once these two issues are resolved, this PR will be ready to merge. The security hardening is solid and the test coverage is appropriate.
Response to Review Comment on test_main.py@copilot You're absolutely correct about the linting requirements! The Required changes for
Here's the corrected version that should pass linting: # SPDX-FileCopyrightText: 2025 Knitli Inc.
# SPDX-FileContributor: Adam Poulemanos <adam@knit.li>
#
# SPDX-License-Identifier: MIT OR Apache-2.0
"""Unit tests for signal handler setup in main module.
This module tests the signal handler registration and interrupt handling
logic in the CodeWeaver main entry point.
"""
from __future__ import annotations
import signal
from unittest.mock import patch
import pytest
from codeweaver.main import _setup_signal_handler
def test_setup_signal_handler_first_interrupt() -> None:
"""Test that the first interrupt raises KeyboardInterrupt."""
with patch("signal.signal") as mock_signal:
_setup_signal_handler()
# Get the registered handler
mock_signal.assert_called_with(signal.SIGINT, mock_signal.call_args[0][1])
force_shutdown_handler = mock_signal.call_args[0][1]
# Test first call raises KeyboardInterrupt
with pytest.raises(KeyboardInterrupt):
force_shutdown_handler(signal.SIGINT, None)
def test_setup_signal_handler_second_interrupt() -> None:
"""Test that the second interrupt exits immediately."""
with patch("signal.signal") as mock_signal:
_setup_signal_handler()
force_shutdown_handler = mock_signal.call_args[0][1]
# First call raises KeyboardInterrupt
with pytest.raises(KeyboardInterrupt):
force_shutdown_handler(signal.SIGINT, None)
# Second call exits with 1
with patch("sys.exit") as mock_exit, patch("codeweaver.main.logger.warning") as mock_warning:
force_shutdown_handler(signal.SIGINT, None)
mock_warning.assert_called_with("Force shutdown requested, exiting immediately...")
mock_exit.assert_called_with(1)
def test_setup_signal_handler_suppress_errors() -> None:
"""Test that ValueError and OSError are suppressed when setting the signal."""
# Test ValueError
with patch("signal.signal", side_effect=ValueError):
original_handler = _setup_signal_handler()
assert original_handler is None
# Test OSError
with patch("signal.signal", side_effect=OSError):
original_handler = _setup_signal_handler()
assert original_handler is NoneThis should resolve the linting issues and make the tests compliant with the project's unit test standards. Once this is fixed, the PR will be ready to merge! π |
There was a problem hiding this comment.
Security Fix: Excellent β
The core security improvements are well-implemented:
Primary Security Changes
- β
Removal of
typefrom safe_builtins (line 148-149): Critical fix that preventstype('X', (object,), {})exploitation. The inline comment clearly explains the rationale. - β
Consistent error handling (line 141-144):
TypeValidatornow raisesTypeError(wrapped to returnNone), making all error paths consistent. Good defensive programming. - β
Test coverage:
test_type_builtin_not_exploitablespecifically validates the security fix. Well done.
Security Assessment: The vulnerability is properly mitigated. The fix correctly identifies that type() is unnecessary for type annotation resolution and represents an exploitable surface when combined with allowed ast.Call nodes.
Critical Issue: Malformed JSON β
// scripts/model_data/hf-models.json line 216
},
"models": {}, // β Duplicate key!
"models": {This file has:
- Duplicate
"models": {}key at line 216 - Malformed closing structure (line 3929-3930)
This will break JSON parsing and must be fixed before merge.
Code Quality Observations
Good Changes β
- multi.py: Using
model_copy(update=...)instead ofobject.__setattr__is proper Pydantic practice - multi.py: Proper fallbacks for conditional imports (
OnnxProvider,ModuleβAny) - Pickle β JSON cache migration: Improves security (JSON can't execute code like pickle)
- Test formatting: Added proper module conventions for unit tests
Questions β οΈ
-
Dependency bumps (pyproject.toml): Have these been tested for compatibility? Notable changes:
huggingface-hub: 0.36.2 β >=1.7.1 (major version jump)openai: 2.17.0 β 2.28.0pydantic-ai-slim: >=1.56.0 β >=1.68.0
-
File size limit increase (hk.pkl): 10MB β 15MB - what's driving this need?
-
Scope: This PR mixes security fixes, cache format migration, lint fixes, and dependency updates. Consider splitting for easier review/rollback.
Recommendation
Request Changes solely due to the JSON syntax error. Once that's fixed, the security improvements are solid and should merge.
The core changes to container.py:87-165 and test_container_security.py:98-103 are excellent defensive security work. π
There was a problem hiding this comment.
Review Summary - Requesting Changes
I've thoroughly reviewed this security hardening PR. While the core security improvements are excellent, there are two critical bugs that must be fixed before merging.
π¨ Critical Issues (Must Fix)
1. Invalid Import Path in scripts/model_data/mteb_to_codeweaver.py
File: scripts/model_data/mteb_to_codeweaver.py:52
Current (incorrect):
from codeweaver.providers.provider import ProviderIssue: The module codeweaver.providers.provider does not exist in the codebase. This script will fail with ModuleNotFoundError when executed.
Required fix:
from codeweaver.core import ProviderVerification:
Providerenum is defined in:src/codeweaver/core/types/provider.py:252- Re-exported from:
src/codeweaver/core/__init__.py(lines 487, 960) - No file exists at:
src/codeweaver/providers/provider.py
This issue was correctly identified by Copilot in comment #2944055716.
2. Malformed JSON in scripts/model_data/hf-models.json
Issue 1 - Duplicate key (lines 216-217):
},
"models": {}, β Empty models dict
"models": { β Duplicate key - INVALID JSON\!
"Alibaba-NLP/gte-modernbert-base": {Issue 2 - Orphaned object (lines 3927-3929):
]
}
} β Should be end of file
{ β Invalid structure
"models": {}
}Impact: This file will fail JSON parsing at runtime.
Required fix:
- Remove the empty
"models": {}on line 216 - Remove the orphaned object structure at the end (lines 3928-3929)
Verification:
python -m json.tool scripts/model_data/hf-models.json
# Currently fails with JSON syntax errorβ Excellent Security Improvements
The primary security changes in src/codeweaver/core/di/container.py are well-designed and properly implemented:
1. Removed type Builtin (lines 148-162)
- β
Correctly prevents exploitation via
type('X', (object,), {}) - β Well-documented with clear security rationale
- β Not needed for legitimate type annotation resolution
2. Consistent Error Handling (lines 141-144)
- β
Changed from mixed behavior to uniform
TypeError β None - β
Updated return type annotation to
Any | None - β More predictable and safer
3. Test Coverage (tests/di/test_container_security.py)
- β
New
test_type_builtin_not_exploitablevalidates the fix - β Proper pytest markers and documentation
- β Follows project conventions
4. NoneType Handling (container.py:644-645)
- β Good defensive addition for edge cases
β οΈ Additional Observations
Positive Changes
- β
Consistent
has_package()usage (removed unnecessaryis not None) - β
Correct logic fix in
backup_models.py:57(changedandtoor) - β Import guards with try/except for robustness
- β
Enum comparisons using
isoperator (best practice) - β
Cleaner
FastEmbedClientOptionsusingmodel_copy()
Scope Note
This PR touches 51 files with multiple concerns:
- Security hardening (primary)
- Pickle β JSON cache migration
- Dependency updates
- Workflow changes
- Various lint fixes
While all changes appear reasonable, consider splitting future PRs for easier review and rollback.
π Action Items
Before merging:
- β Security improvements - ready to merge
- β Fix import path in
mteb_to_codeweaver.pyline 52 - β Fix JSON syntax in
hf-models.json(remove duplicate key and orphaned object) β οΈ Verify node_types_cache format migration doesn't break existing deploymentsβ οΈ Run full test suite
Verdict
Security hardening: Excellent work - well-designed, properly tested, and ready to merge.
Blocking issues: Two critical bugs prevent merging. Once the import path and JSON syntax are fixed, this PR will be ready to approve.
The security improvements are solid and address the vulnerability appropriately. Great job on the implementation and test coverage!
The
_safe_eval_typemethod had two issues:typewas included insafe_builtins(unnecessary and exploitable viatype('X', (object,), {})withast.Callallowed), and error handling was inconsistent βSyntaxErrorreturnedNonewhileTypeErrorfromTypeValidatorpropagated up.Changes
src/codeweaver/core/di/container.pytypefromsafe_builtins; added inline comment explaining the exclusionTypeValidatornow raisesTypeError(perTRY004); wrapped intry/except TypeError: return Noneβ method now consistently returnsNonefor all invalid inputsAny | Noneand docstring to matchtests/di/test_container_security.pyfrom __future__ import annotations,pytestmark, and per-function docstrings to match project conventionstest_type_builtin_not_exploitableverifyingtype('X', (object,), {})is blockedPre-existing lint fixes (required to unblock push after rebase onto
main)node_type_parser.py:assert adapter is not Noneβif adapter is None: return Falseproviders/config/clients/agent.py: removed stalety: ignore[invalid-method-override]providers/config/clients/multi.py: removed unused# type: ignorefromelsebranchesscripts/model_data/mteb_to_codeweaver.py: added# ty: ignore[invalid-type-form]toLiteral[Provider.X]members (ty doesn't recognizeBaseEnumsubclass members as validLiteralargs)π Connect Copilot coding agent with Jira, Azure Boards or Linear to delegate work to Copilot in one click without leaving your project management tool.